The demo is intoxicating. You give a model 15 tools, write "you are an autonomous agent, achieve the goal," and watch it reason, call tools, and adapt. It looks like the future. On Twitter it gets 10,000 likes.
In production it gets you paged at 2am because a run has been looping for forty minutes, has spent $14 in tokens deciding whether to re-read the same file, and is now timing out the request that started it. I have built both kinds of system. This is the case for taking the steering wheel away from the model.
1. What "let the LLM decide" actually costs
When the model owns control flow, three failure modes are not edge cases; they are the default behaviour under load:
- Non-termination. Nothing guarantees the loop ends. The model can decide "I should double-check" forever. Without an externally enforced bound, "agentic" means "no base case."
- Latency variance. A task that takes 3 steps today takes 11 tomorrow because the model felt thorough. You cannot put a p99 SLA on a process whose step count is a sample from a distribution you do not control.
- Token blowup. Every extra step re-sends the growing transcript, so total input tokens grow roughly quadratically with the number of steps rather than linearly. The long runs in the tail hurt far more than the average run suggests.
The root cause is singular: you put the control flow inside the prompt, which is the one component of your system that is probabilistic by construction. You would never accept a while loop whose exit condition is "usually correct." That is exactly what an autonomous agent loop is.
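The token blowup point is easy to check with arithmetic. Here is a minimal sketch, assuming every step re-sends the whole transcript and appends a fixed amount to it. The function name and the sizes (2,000 tokens of base prompt, 500 tokens per step) are invented for illustration, not measurements from a real run.

```python
def total_input_tokens(steps: int, base_prompt: int = 2_000, per_step: int = 500) -> int:
    """Sum the input tokens sent over a run that re-sends the full transcript each step.

    base_prompt and per_step are illustrative sizes, not measured values.
    """
    total = 0
    transcript = base_prompt
    for _ in range(steps):
        total += transcript      # every call pays for everything said so far
        transcript += per_step   # this step's tool call and result get appended
    return total


for n in (3, 11, 40):
    print(n, total_input_tokens(n))
# 3 7500
# 11 49500
# 40 470000
```

Under these assumptions, going from 3 steps to 11 is about 3.7 times the steps and 6.6 times the input tokens.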
2. The model proposes; the code disposes
The principle I build on: move control flow out of the prompt and back into code. The LLM is demoted from "orchestrator" to "proposer." It suggests the next action as structured data. Deterministic code validates that proposal, decides whether and how to execute it, and owns the loop, the ordering, and the exit conditions.
Concretely, that means two hard boundaries around the model:
- The model never calls a tool directly. It emits a typed object describing the action it wants. In my Python stack that object is validated with Pydantic v2 before anything happens; a proposal that fails the schema is rejected, not executed.
- The model never decides the sequence of steps. A graph does.
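The first boundary can be sketched in a few lines. This is a dependency-free stand-in that uses only the standard library; it is not the Pydantic v2 validation described above, and the tool names, the `Proposal` shape, and the `dispose` helper are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical tool registry: the only actions the code is willing to run.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "list_dir": lambda path: f"<entries in {path}>",
}


@dataclass(frozen=True)
class Proposal:
    """The typed object the model is asked to emit instead of calling a tool."""
    tool: str
    path: str


class RejectedProposal(Exception):
    """Raised when the model's output does not match the expected shape."""


def parse_proposal(raw: str) -> Proposal:
    """Validate raw model output; anything malformed is rejected, never run."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedProposal(f"not JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "path"}:
        raise RejectedProposal("expected exactly the keys 'tool' and 'path'")
    if not isinstance(data["tool"], str) or not isinstance(data["path"], str):
        raise RejectedProposal("'tool' and 'path' must be strings")
    if data["tool"] not in TOOLS:
        raise RejectedProposal(f"unknown tool: {data['tool']!r}")
    return Proposal(tool=data["tool"], path=data["path"])


def dispose(raw: str) -> str:
    """Code, not the model, decides whether the proposed action executes."""
    try:
        proposal = parse_proposal(raw)
    except RejectedProposal as exc:
        return f"rejected: {exc}"
    return TOOLS[proposal.tool](proposal.path)
```

With this sketch, `dispose('{"tool": "read_file", "path": "a.txt"}')` runs the registered tool, while `dispose('{"tool": "rm_rf", "path": "/"}')` returns `"rejected: unknown tool: 'rm_rf'"` without executing anything.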
3. A DAG, not a vibe
The orchestration in my agent SaaS engine is a directed acyclic graph of nodes. Each node names an agent; edges declare dependencies. The execution order is not something the model muses about; it is computed by Kahn's algorithm, in deterministic code, every time:
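```python
from collections import defaultdict, deque
from typing import List

# Method excerpt: self.nodes is a dict keyed by node name,
# self.edges is an iterable of (upstream, downstream) pairs.
def topological_order(self) -> List[str]:
    """Kahn's algorithm - returns nodes in dependency order."""
    indeg, adj = defaultdict(int), defaultdict(list)
    for u, v in self.edges:
        adj[u].append(v)
        indeg[v] += 1
        indeg.setdefault(u, 0)
    # Start from every node that has no unmet dependency.
    q = deque([n for n in self.nodes if indeg.get(n, 0) == 0])
    order, seen = [], set()
    while q:
        u = q.popleft()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                q.append(w)
    # A cycle leaves some nodes unscheduled; in that case fall back to insertion order.
    return order if len(order) == len(self.nodes) else list(self.nodes.keys())
```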
Each node carries a status that a state machine advances: pending → running → done | failed. The model fills in the content of a node (the code, the review, the summary). It has zero say over which node runs next. That is the difference between a system you can reason about and one you can only pray over.
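A minimal sketch of that status machine, assuming the order has already been computed and each agent is a plain callable. The `Node` class, the `ALLOWED` table, and `run_in_order` are hypothetical names for illustration, not the engine's actual code. Stopping at the first failure is also an assumption.

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Legal transitions, mirroring pending -> running -> done | failed.
ALLOWED = {
    Status.PENDING: {Status.RUNNING},
    Status.RUNNING: {Status.DONE, Status.FAILED},
    Status.DONE: set(),
    Status.FAILED: set(),
}


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.status = Status.PENDING

    def advance(self, new: Status) -> None:
        """Move to a new status, refusing any transition not in ALLOWED."""
        if new not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.name}: illegal transition {self.status.value} -> {new.value}"
            )
        self.status = new


def run_in_order(
    order: List[str],
    nodes: Dict[str, Node],
    agents: Dict[str, Callable[[Dict[str, str]], str]],
) -> Dict[str, str]:
    """Walk a precomputed order; the agent supplies content, never the next step."""
    results: Dict[str, str] = {}
    for name in order:
        node = nodes[name]
        node.advance(Status.RUNNING)
        try:
            results[name] = agents[name](results)
            node.advance(Status.DONE)
        except Exception:
            node.advance(Status.FAILED)
            break  # assumption: stop at the first failure
    return results
```

The agent callables only return content. The `for` loop over `order` is the only place that chooses what runs next, and an attempt to jump a node from pending straight to done raises instead of silently succeeding.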
4. Bounding the loop you do keep
Some loops are genuinely useful: a coder agent and an auditor agent iterating until the code passes review. The point is not to forbid loops; it is to forbid unbounded ones. In my multi-agent SDLC engine the pipeline is fixed (Researcher, Architect, then Coder and Auditor iterating a bounded number of times, then Documenter), and every role runs under a hard timeout ceiling written in code, not chosen by the model:
```js
// D.3 - Hard per-role LLM call timeout. Bounds worst-case iteration cycle.
const ROLE_TIMEOUTS = {
  Architect: 300_000,
  Coder: 300_000,
  Auditor: 60_000,
  Researcher: 60_000,
  Perceptor: 30_000,
};
```
On top of the timeout sits a session-level circuit breaker: once a provider exhausts its model chain, it is skipped for the rest of the run instead of being retried into the ground. Between the fixed pipeline, the bounded iteration count, the per-role timeout, and the breaker, the worst case is knowable. I can tell you the maximum number of LLM calls a run will make before it starts. Try saying that about an open-ended agent.
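To show why the worst case is knowable, here is a rough Python sketch of the bounded review loop and a session-level breaker. It is not the Node.js engine's code. The bound of 3 rounds, the one-LLM-call-per-role assumption, and every name below are hypothetical, and a `RuntimeError` stands in for "this provider has exhausted its model chain."

```python
from typing import Callable, List, Set, Tuple

MAX_REVIEW_ROUNDS = 3  # hypothetical bound; the real number is not stated here


def max_llm_calls(rounds: int = MAX_REVIEW_ROUNDS) -> int:
    """Worst case for Researcher, Architect, N x (Coder + Auditor), Documenter.

    Assumes exactly one LLM call per role invocation.
    """
    return 2 + 2 * rounds + 1


def review_loop(
    coder: Callable[[str], str],
    auditor: Callable[[str], bool],
    rounds: int = MAX_REVIEW_ROUNDS,
) -> Tuple[str, int, bool]:
    """Iterate Coder and Auditor; the range() bound, not the model, ends the loop."""
    draft = ""
    for attempt in range(1, rounds + 1):
        draft = coder(draft)
        if auditor(draft):
            return draft, attempt, True
    return draft, rounds, False  # bound reached without passing review


class SessionBreaker:
    """Remembers providers that failed so the rest of the run skips them."""

    def __init__(self) -> None:
        self._tripped: Set[str] = set()

    def trip(self, provider: str) -> None:
        self._tripped.add(provider)

    def is_open(self, provider: str) -> bool:
        return provider in self._tripped


def call_with_fallback(
    providers: List[str], breaker: SessionBreaker, call: Callable[[str], str]
) -> str:
    """Try providers in order, skipping any the breaker has already tripped."""
    for provider in providers:
        if breaker.is_open(provider):
            continue
        try:
            return call(provider)
        except RuntimeError:
            breaker.trip(provider)  # skipped for the remainder of the session
    raise RuntimeError("all providers exhausted")
```

Under these assumptions `max_llm_calls(3)` is 9, a number available before the run starts. The loop returns a flag rather than looping again when the bound is hit, so the caller decides what an unapproved draft means.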
5. What you give up, honestly
This is a trade, not a free win. A DAG cannot discover a step you did not anticipate. If the problem genuinely requires open-ended exploration (research over an unknown space, a coding task whose shape is not known until halfway through), a fully autonomous loop will sometimes find a path your graph forecloses.
So the honest rule is the same as the autonomy ladder: climb only as high as you need. Use full autonomy for low-stakes, exploratory, human-in-the-loop work where a wrong turn is cheap and a human catches it. Use deterministic control flow for anything with an SLA, a budget, or a blast radius. Most production systems are a deterministic pipeline with one or two genuinely agentic steps inside, not a swarm of free agents.
What I Built
The DAG orchestrator lives in the Agent SaaS Boilerplate (FastAPI, Pydantic v2, a Workflow engine with topological scheduling and an AgentRegistry of pluggable agents, 22 tests). The bounded multi-agent pipeline with per-role timeouts and a session circuit breaker is the Agentic SDLC engine (Node.js, SQLite-durable blackboard). Both are built on the same conviction: the model is the most powerful component in the system and the least trustworthy place to put your control flow.